Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup timestamp-query example #476

Merged
merged 3 commits into from
Nov 19, 2024
Merged

Conversation

greggman
Copy link
Collaborator

@greggman greggman commented Nov 15, 2024

This was a great example. I started this change as just removing a console.log that seemed like it shouldn't be there but then I noticed the code didn't actually handle the case when 'timestamp-query' doesn't exist and ended up making a bunch of changes.

I didn't mean to step on anyone's toes. Here's the change though.

  1. Remove console.log from timestamp query sample

    This was just filling all of memory with strings. Probably not a big deal since no one is likely to run the sample for hours but still....

    Maybe add a <pre> spot that holds the last N queries if you want to see a history?

  2. no need for hasOngoingTimestampReadback

    GPUBuffer already has mapState which tells you this state so no need to track it separately.

  3. There's no need for timestampCount

    It's already on GPUQuerySet.count

  4. code didn't check if timestamps are supported before adding timestamp section to the render pass so the sample got a validation error if timestamp-query was not avaialble.

    Note: you can test this with the webgpu-dev-extension. Type 'timestamp-query' into the "Block Feature" field and reload the page.

  5. Refactor not to use readAsync

    This seems like a problematic API. You call readAsync and if the map is pending then your callback is ignored. Ideally, if a callback was the right API design, then every callback you pass should get called and if it's pending then it would need to add your callback to a list of callbacks to be called when it's ready.

    Further, the timestamps passed back are only valid during the callback. If you tried to keep them they'd magically disappear as soon as your callback exited since they're directly the mapped buffer which will be unmapped as soon as you exit.

    Further, I'd argue for this sample at least, all you want is the last timing.

    So, changed the API to just give you the last times. No callbacks.

  6. It's not clear what TimestampQueryManager was really managing.

    It wasn't adding the timestamps for you. You had to add them yourself and pull out the timestamp query object from internal fields inside TimestampQueryManager.

    Further, because you were adding them yourself you'd have to check yourself, if queries were supported then add them.

    Then, beacuse you were adding them yourself it was up to you to choose start and ends indices.

    It was also passing out BigInt but BitInt is hard to use correctly. I'd argue you don't need it. IIUC, Number.MAX_SAFE_INTEGER is 104 days of nanoseconds so it's fine to convert and give the user something easy to work with.

    So, I changed it to take a pairId and then use that to set both start and end. That way it can auto-subtract the pairs and give you Number. The user doesn't have to deal with BigInt. It still has the issue the timestamps a only kind of valid during the callback but that's not really the contract. Rather, they are always the last read value but at least they don't go to undefined if you happened to save timestamp?

Other: I considered even getting rid of pairId and have addTimestampWrite just auto increment the indices to use. It would then reset to 0 when you call update. With that, it could easily add up the times and give you the total time across passes. That way you wouldn't have to manage the pairIds yourself. On the otherhand, if you wanted individual pass timings you need to know which pass got which pairId. It could pass the pairId back from addTimestamp but I was less sure about that change.

Anyway, I hope this is considered an improvement.

One other thing, I don't understand PerfCounter. If you get one bad outsized timing it seems like it will take an extremely long time to recover though maybe I don't understand the math.

This was a great example. I started this change as just
removing a `console.log` that seemed like it shouldn't
be there but then I noticed the code didn't actually
handle the case when 'timestamp-query' doesn't exist
and ended up making a bunch of changes.

I didn't mean to step on anyone's toes. Here's the
change though.

1. Remove `console.log` from timestamp query sample

   This was just filling all of memory with strings.
   Probably not a big deal since no one is likely to
   run the sample for hours but still....

   Maybe add a `<pre>` spot that holds the last N queries
   if you want to see a history?

2. no need for `hasOngoingTimestampReadback`

   `GPUBuffer` already has `mapState` which tells you this state
   so no need to track it separately.

3. There's no need for timestampCount

   It's already on `GPUQuerySet.count`

4. code didn't check if timestamps are supported before adding
   `timestamp` section to the render pass so the sample got a validation
   error if timestamp-query was not avaialble.

   Note: you can test this with the webgpu-dev-extension. Type
   'timestamp-query' into the "Block Feature" field and
   reload the page.

5. Refactor not to use `readAsync`

   This seems like a problematic API. You call `readAsync`
   and if the map is pending then your callback is ignored.
   Ideally, if a callback was the right API design, then
   every callback you pass should get called and if it's pending
   then it would need to add your callback to a list of callbacks
   to be called when it's ready.

   Further, the timestamps passed back are only valid
   during the callback. If you tried to keep them they'd
   magically disappear as soon as your callback exited
   since they're directly the mapped buffer which will be
   unmapped as soon as you exit.

   Further, I'd argue for this sample at least, all you want is
   the last timing.

   So, changed the API to just give you the last times.
   No callbacks.

6. It's not clear what TimestampQueryManager was really managing.

   It wasn't adding the timestamps for you.
   You had to add them yourself and pull out the timestamp
   query object from internal fields inside TimestampQueryManager.

   Further, because you were adding them yourself you'd
   have to check yourself, if queries were supported then
   add them.

   Then, beacuse you were adding them yourself it was up to you
   to choose start and ends indices.

   It was also passing out BigInt but BitInt is hard to use correctly.
   I'd argue you don't need it. IIUC, Number.MAX_SAFE_INTEGER is 104 days
   of nanoseconds so it's fine to convert and give the user
   something easy to work with.

   So, I changed it to take a pairId and then use that
   to set both start and end. That way it can auto-subtract the
   pairs and give you Number. The user doesn't have to deal with
   BigInt.

Other: I considered even getting rid of pairId and have `addTimestampWrite`
just auto increment the indices to use. It would then reset to 0 when you
call `update`. With that, it could easily add up the times
and give you the total time across passes.

That way you wouldn't have to manage the pairIds yourself. On the otherhand,
if you wanted individual pass timings you need to know which pass got which
pairId. It could pass the pairId back from `addTimestamp` but I was less sure
about that change.

Anyway, I hope this is considered an improvement.
Copy link
Collaborator

@kainino0x kainino0x left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments but I think this is good!
FYI @eliemichel

sample/timestampQuery/main.ts Outdated Show resolved Hide resolved
sample/timestampQuery/TimestampQueryManager.ts Outdated Show resolved Hide resolved
sample/timestampQuery/TimestampQueryManager.ts Outdated Show resolved Hide resolved
@beaufortfrancois
Copy link
Collaborator

It would be nice if compute boids sample is similar (either use timestampquery-manager or uses same variables): https://github.com/webgpu/webgpu-samples/blob/main/sample/computeBoids/main.ts

@kainino0x
Copy link
Collaborator

It would be nice if compute boids sample is similar (either use timestampquery-manager or uses same variables): https://github.com/webgpu/webgpu-samples/blob/main/sample/computeBoids/main.ts

Sharing the manager would be good! I'll land this but that can be a potential followup

@kainino0x kainino0x merged commit 530c33a into webgpu:main Nov 19, 2024
1 check passed
@eliemichel
Copy link
Contributor

Thank you @greggman for the thorough feedback and retakes! I admit I am more used to playing with the C API rather than the JavaScript one, and some parts of the example were clearly imperfect.

Sorry I could not reply earlier, but I do not fully agree with the proposed change. I'll try my best to explain (but am off course open to counter arguments!):

It's not clear what TimestampQueryManager was really managing.

The logic of the TimestampQueryManager is to be independent from the processing of the query results: the user says "My pipeline needs N timestamps, and here is what to do with the results (the callback)". I agree that my callback API was pretty bad indeed, it should have been a callback set once for all! But I still find it interesting to be a callback rather than moving the elapsed time computation logic inside the manager: if the user has multiple passes and wants to compute various timestamp difference, that's up to them.

You are effectively moving the behavior of onTimestampReadBack within the manager, making it specific, while otherwise the mechanism is generic (and the interpretation of timestamp indices is up to the user, hence it is manually specified in the pass descriptor (typically, using an enum if there are more than 2 timestamps).

I have been using a C equivalent of this manager for a while, and it is something I want to just copy-paste from one project to another without needing to customizing it all the time: it contains no domain-specific behavior.

NB: This is also why the query count was a user argument, and resolveAll called wuith "all".

One other thing, I don't understand PerfCounter. If you get one bad outsized timing it seems like it will take an extremely long time to recover though maybe I don't understand the math.

If it is just for display it may not be ideal, but a typical use case is to measure perf for a benchmark report, which means giving an equal role to all samples. This just computes the mean/stddev w/o storing all samples.

Furthermore, a proper benchmarking requires to accurately know the number of samples. However, reading the stateful passDurationMeasurementNs rather than getting a callback leads to an incorrect sample count, because addSample is now called at each frame rather than each time a timestamp query gets read back (ofc one should also keep track of which passes were invoked in cases some of them do not run at each frame).

Of course a lot of this is mostly a question of genericity over minimalism and any outcome (your version, my proposition, or any other) would make as much sense in the end (the first version of the example was on the opposite side of the spectrum -- no abstraction at all -- and it was (rightfully) deemed hard to read). I only thinking that if, as suggested, we wanted to reuse this manager in other examples, it could be interesting to make it not-too-specific! Again, open to counter-arguments :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants